Abstract

New multivariate permutation tests are proposed which may be
effectively substituted for Hotelling's T² test in
situations commonly arising in educational research. The new
tests (a) are distribution-free, (b) provide tests of
directional as well as nondirectional hypotheses, (c) may be
tailored for sensitivity to specific treatment effects, and
(d) may be computed when the number of variables is larger
than the number of subjects. Comparisons of the power of the
permutation tests to that of Hotelling's T² suggest
substantial advantages in a number of situations. Results
are interpreted in terms of applications to educational
research in which multivariate research questions are posed
but the number of units for analysis is small.

Power Properties of Multivariate Permutation Tests
Relative to Hotelling's T² in Small Samples



The advantages of multivariate statistical tests
relative to univariate hypothesis tests have been well
documented in the methodological literature in education and
related fields of study (e.g., Stevens, 1986; Huberty &
Morris, 1989; Ottenbacher, 1989). However, limitations
of popular multivariate tests have also been recognized.

Practical problems associated with multivariate tests 
arise in many research applications when the number of 
observations is limited. A commonly encountered small sample 
situation is when classrooms or schools are used as the unit 
of analysis in applied research or evaluation studies. 
Issues and strategies related to units of analysis have been 
described in Blair and Higgins (1986), Hopkins (1982), and 
Barcikowski (1981). First, the power of multivariate tests in
small-sample research is often limited (Stevens, 1980), and in
extreme circumstances (i.e., when the number of observations
is less than the number of variables) common multivariate
test statistics cannot be computed. Second, the assumption
of multivariate normality, which underlies most multivariate
test statistics, is often unjustified with educational data
(Micceri, 1989). Although the test statistics may be robust
to violations of this assumption, the sample sizes required
to be confident of this robustness offer little
reassurance to researchers dealing with small samples
(Everitt, 1979; Olson, 1974). Third, multivariate
procedures are formulated to detect any departures from the
null hypothesis, and may therefore lack power to detect
specific departures (Meier, 1975; O'Brien, 1984; Pocock,
Geller & Tsiatis, 1987). Finally, multivariate tests are
inherently nondirectional (two-tailed) and do not provide
the power advantages obtained through the specification of a
directional hypothesis test when the researcher can
formulate such a directional hypothesis.

The objectives of this research are to present an 
alternative statistical methodology to popular multivariate 
testing procedures and to investigate the power properties 
of this method relative to a popular multivariate test (the 
paired samples Hotelling's T²). The alternative method,
based upon permutation tests, has the potential to overcome
the limitations described above. Moreover, the general
methods described in this research are easily extended to
the independent samples Hotelling's T². The remainder of
this paper consists of a description of the proposed 
permutation tests, a presentation of the results of a study 
designed to compare the power of the new tests to that of 
Hotelling's test, and a brief consideration of the 
implications of these tests for educational researchers. 

Proposed Tests 

The theoretical bases of permutation tests (also known
as the method of randomization) were developed by Pitman
(1937) and Fisher (1966). Univariate permutation tests are
relatively well-known and have been described in detail by
Bradley (1968) and Noreen (1989). In contrast, extensions of
these procedures to multivariate data analysis have been
limited (Boyett & Shuster, 1977).

In general, the sampling distribution of a multivariate 
permutation test statistic is obtained by computing the 
desired statistic on all possible permutations of the data
vectors obtained from the units of analysis in the research
study. All such permutations are equally likely under the
null hypothesis of no treatment effect. The probability,
under a true null hypothesis, of obtaining a value of the
test statistic at least as large as that calculated from the
sample is computed by counting the number of such statistics
that equal or exceed the obtained value, and dividing this
count by the total number of permutations.

More formally, x_i = (x_{i1}, ..., x_{ip}) and
y_i = (y_{i1}, ..., y_{ip}) are p-dimensional vectors denoting
observed values from the ith subject under control and treatment
conditions, respectively, and d_i = (x_{i1} - y_{i1}, ..., x_{ip} - y_{ip})
denotes the p-dimensional vector of differences that represents
change between the treatment and control conditions.
Finally, -d_i represents the negative of vector d_i (for
example, if +d_i = (-1, 2, 4), then -d_i = (1, -2, -4)).

The probability level associated with a test statistic
t, based on the permutation principle, is computed as
follows. For each of the 2^n possible assignments of + or -
to the n vectors d_i, i = 1, ..., n, which are equally
likely to occur under the null hypothesis, compute the value
of the test statistic. If t_0 is the value of the statistic
computed on the original data, and N(t_0) is the number of
permutations in which the value of t is greater than or
equal to t_0, then the observed (one-tailed) significance
level of the test is

p = N(t_0) / 2^n
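
As a minimal sketch of the exact procedure just described, the following Python code enumerates all 2^n sign assignments for a small data set. The function names and the placeholder statistic mean_sum (the sum over variables of the mean difference) are assumptions introduced for illustration; mean_sum is not one of the statistics examined in this study, and any function of the n x p matrix of differences could be passed in its place.

```python
from itertools import product

def mean_sum(d):
    """Placeholder statistic: sum over variables of the mean difference."""
    n = len(d)
    return sum(sum(col) / n for col in zip(*d))

def exact_sign_flip_p(d, statistic):
    """One-tailed p = N(t_0) / 2^n over all sign assignments to the rows of d."""
    n = len(d)
    t0 = statistic(d)
    count = 0
    # Each tuple in the product assigns + or - to every difference vector d_i.
    for signs in product((1, -1), repeat=n):
        flipped = [[s * x for x in row] for s, row in zip(signs, d)]
        if statistic(flipped) >= t0 - 1e-12:  # small tolerance for rounding
            count += 1
    return count / 2 ** n

# Four hypothetical subjects, two variables, all differences positive.
d = [[1.0, 2.0], [2.0, 1.5], [3.0, 0.5], [1.5, 3.0]]
print(exact_sign_flip_p(d, mean_sum))  # 0.0625, i.e., 1/16
```

Because every difference in this illustrative data set is positive, only the observed assignment attains the observed value, so the sketch returns 1/16, the smallest significance level attainable with n = 4.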

Computation of the 2^n test statistics may be
prohibitive with moderate sample sizes, even on modern computers.
For this reason, approximate permutation tests (Edgington,
1969) are used in the power study to be described. The
approximate test differs from the exact test in that, rather
than computing all 2^n possible test statistics, a large
random sample of such statistics is computed. In the power
study to be described, 1000 such statistics were computed
for each permutation test. The associated probability for
the approximate permutation test is computed as

p = N(t_0) / M

where M is the number of random permutations. The difference 
between the exact and approximate permutation methods is 
small when the number of random samples, M, is large. The 
specific multivariate test statistics examined in this study 
are described below. 
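
The approximate test can be sketched in the same way. The code below is an illustration under stated assumptions, not the FORTRAN routine used in the study: signs are drawn independently with Python's random module (so assignments may repeat), the seed argument is a hypothetical convenience for repeatability, and mean_sum is again a placeholder statistic.

```python
import random

def mean_sum(d):
    """Placeholder statistic: sum over variables of the mean difference."""
    n = len(d)
    return sum(sum(col) / n for col in zip(*d))

def approx_sign_flip_p(d, statistic, m=1000, seed=0):
    """Approximate one-tailed p = N(t_0) / M from m random sign assignments."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    t0 = statistic(d)
    count = 0
    for _ in range(m):
        flipped = []
        for row in d:
            s = rng.choice((1, -1))  # each sign equally likely under the null
            flipped.append([s * x for x in row])
        if statistic(flipped) >= t0 - 1e-12:  # small tolerance for rounding
            count += 1
    return count / m
```

For scale, n = 10 gives 2^10 = 1,024 possible assignments, so M = 1000 random assignments is close to full enumeration, whereas n = 25 gives 2^25 = 33,554,432.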

The first statistic, tsum, is defined as

tsum = \sum_{j=1}^{p} t_j

where t_j denotes the usual one-sample t statistic computed
on the jth element of d. This statistic was examined in
one-tailed and two-tailed versions that will be referred to as
tsum1 and tsum2, respectively.

The second test statistic, t|sum|, is defined as

t|sum| = \sum_{j=1}^{p} |t_j|

where |t_j| denotes the absolute value of the one-sample t
statistic computed on the jth element of d. In contrast to
the tsum statistic, t|sum| yields only a two-tailed test.

The final test statistic proposed, tmax, is defined as

tmax = t_{j'}

where t_{j'} is equal to the t_j (j = 1, ..., p) that is
greatest in absolute value. This statistic was examined in
one-tailed and two-tailed versions that will be referred to
as tmax1 and tmax2, respectively.
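
The three statistics can be illustrated with a short Python sketch. The data layout (a list of n difference vectors, each of length p) and the function names are assumptions made for this example, and the code assumes that no variable has identical differences for every subject, since the t statistic is then undefined.

```python
from math import sqrt

def t_stat(values):
    """One-sample t statistic; assumes the values are not all identical."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / sqrt(var / n)

def column_t(d):
    """t_j for each of the p variables (columns) of the n x p matrix d."""
    return [t_stat(col) for col in zip(*d)]

def t_sum(d):
    """tsum: signed sum of the univariate t statistics."""
    return sum(column_t(d))

def t_abs_sum(d):
    """t|sum|: sum of the absolute values of the univariate t statistics."""
    return sum(abs(t) for t in column_t(d))

def t_max(d):
    """tmax: the univariate t statistic that is greatest in absolute value."""
    return max(column_t(d), key=abs)

# Hypothetical differences: variable 1 increases, variable 2 decreases.
d = [[1.0, -2.0], [2.0, -1.0], [3.0, -4.0]]
print(t_sum(d), t_abs_sum(d), t_max(d))
```

In this illustrative data set the two t statistics have opposite signs, so they partly cancel in tsum but accumulate in t|sum|. One reading of the two-tailed versions is that the absolute value of tsum or tmax is referred to the permutation distribution; that construction is an assumption here.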

The test statistics described above are designed to be 
sensitive to different forms of departure from the null 
hypothesis. Because tsum is the summation of the individual 
univariate t statistics, it should be most efficient in 
detecting treatment effects that bring about general 
increases or decreases across all p variables. Note, 
however, that tsum would not be sensitive to effects that 
bring about increases in some variables and decreases in 
others, because the differences in algebraic signs of the 
univariate t statistics would tend to cancel. For this type 
of treatment effect, the test statistic t|sum| should be 
notably more sensitive. Finally, tmax is designed to detect 
treatment effects that impact only a small subset of 
dependent variables, such as might be seen when student 
attitudes are affected by a treatment but student 
achievement is not affected. The relative success of these 
strategies is assessed in the sections that follow. 

Method 

The Monte Carlo study described in this section was
designed to compare the power of the five multivariate
permutation tests to that of Hotelling's T² under four
treatment effect models. Data were generated by sampling
from a multivariate normal distribution with correlations
between any two variables j and j' given by

r_{jj'} = 1 - (j - j')(1/p),   j, j' = 1, ..., p;  j \geq j'

where p represents the number of variables in the data set.
In this study, p took the values of 4, 8, 16, 21, 32, and
48. Data were generated to simulate the d_i defined above
with n taking the values of 10 or 25. The code for this
Monte Carlo study was written in FORTRAN making use of a
number of subroutines from the International Mathematical
and Statistical Libraries (IMSL, 1987).
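
The correlation structure and the data generation step can be sketched as follows. This is an illustration only: it substitutes a plain Cholesky factorization and Python's random module for the IMSL subroutines, fills the matrix symmetrically using |j - j'|, and assumes unit variances; all function names and the shift argument are hypothetical.

```python
import random
from math import sqrt

def correlation_matrix(p):
    """r_{jj'} = 1 - |j - j'| / p, filled symmetrically."""
    return [[1 - abs(j - k) / p for k in range(p)] for j in range(p)]

def cholesky(a):
    """Lower-triangular factor low such that low * low^T = a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def draw_differences(n, p, shift, rng):
    """n correlated normal difference vectors with mean vector shift."""
    low = cholesky(correlation_matrix(p))
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]  # independent N(0, 1)
        rows.append([shift[j] + sum(low[j][k] * z[k] for k in range(j + 1))
                     for j in range(p)])
    return rows

print(correlation_matrix(4)[0])  # [1.0, 0.75, 0.5, 0.25]
```

Under this formula adjacent variables are highly correlated and the correlation falls linearly with the distance between variable indices.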

Four treatment effect models were examined in the
study. In the first treatment model a constant treatment
effect (+.5σ) was added to all variables, where σ represents
the standard deviation of the marginal distributions. This
simulates an effect in which all dependent variables are
increased by the treatment. From the point of view of an
ANOVA design, this effect represents a main effect due to
treatment.

The second treatment model was obtained by adding .5σ
to half of the dependent variables and subtracting .5σ from
the other half. This represents an effect in which some 
dependent variables are affected by the treatment in a 
positive direction, while others are affected in a negative 
direction. For example, a hypothetical treatment may yield 
an increase in student achievement, but a decrease in 
student attitudes. From the perspective of ANOVA, this 
represents a disordinal interaction. 

The third treatment model was obtained by adding .5σ to
one-fifth of the dependent variables and leaving the other
four-fifths unchanged. This represents an effect in which
only a small proportion of the dependent variables are
affected by the treatment. An example in educational
research is a hypothetical treatment that affects only some
measures of student achievement (perhaps students'
acquisition of basic skills) but does not affect students'
higher-order thinking skills or attitudes.

The last treatment model examined was obtained by
adding (j)(.5/p)σ to the jth dependent variable, with j
taking the values 1 to p. This represents an effect in which
all of the dependent variables are affected by the
treatment, but the magnitude of the effect is variable. In
educational research, a hypothetical treatment may strongly
affect students' acquisition of basic skills, but affect
students' higher-order thinking skills and attitudes to a
much lesser extent. From the perspective of ANOVA, this
represents an ordinal interaction.
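
The four treatment models amount to four mean-shift vectors, which the following sketch constructs. Several details are assumptions made for illustration because the text does not state them: which variables receive each shift (here, the first ones), the rounding rule for "one-fifth" when p is not a multiple of five (here, rounding up), and the default of σ = 1. The function name is hypothetical.

```python
from math import ceil

def treatment_shift(model, p, sigma=1.0):
    """Mean shift added to each of the p variables under a treatment model."""
    if model == 1:   # constant effect on every variable
        return [0.5 * sigma] * p
    if model == 2:   # half increased, half decreased
        half = p // 2
        return [0.5 * sigma] * half + [-0.5 * sigma] * (p - half)
    if model == 3:   # only about one-fifth of the variables affected
        k = ceil(p / 5)  # assumption: the rounding rule is not given
        return [0.5 * sigma] * k + [0.0] * (p - k)
    if model == 4:   # effect grows linearly from .5/p to .5
        return [j * (0.5 / p) * sigma for j in range(1, p + 1)]
    raise ValueError("model must be 1, 2, 3, or 4")

print(treatment_shift(4, 4))  # [0.125, 0.25, 0.375, 0.5]
```

Note that under the second model the shifts sum to zero, which is the situation in which a signed sum of t statistics would be expected to lose sensitivity.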

In addition to the four treatment effects studied, a 
null model was investigated. Because the permutation tests 
are distribution-free and the assumption of population 
normality that underlies Hotelling's T² test was met, this
model served to verify the FORTRAN program used to carry out 
the simulations. 

Simulations were carried out for situations in which 
the sample sizes were 10 and 25, and the number of dependent 
variables ranged from 4 to 48. For this study, the sampling 
distributions of the permutation statistics (and, hence, the
decisions to reject or fail to reject the null hypothesis) 
were based on 1,000 random permutations of each sample. The 
Type I error and power estimates were based on 5,000 samples 
from each experimental condition. Two-tailed tests of 
significance were carried out at .10, .05, and .01 levels 
for all of the test statistics. In addition, one-tailed 
tests were conducted at the same levels for the test 
statistics tsum and tmax. 
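
As a rough guide to the precision of these estimates, and assuming the 5,000 samples within a condition are independent, the standard error of an estimated rejection rate follows from the binomial variance. The symbols \pi (the true rejection probability) and \hat{\pi} (its estimate) are notation introduced here for illustration.

```latex
\mathrm{SE}(\hat{\pi}) = \sqrt{\frac{\pi(1-\pi)}{5000}}
  \le \sqrt{\frac{0.25}{5000}} \approx 0.0071,
\qquad
\mathrm{SE}(\hat{\pi}) \approx 0.0031 \ \text{at}\ \pi = .05
```

Under that assumption, differences between power estimates of a few hundredths exceed what sampling error in the simulation alone would typically produce.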

Results

As expected, the Type I error rates (obtained under the 
null model) were near nominal levels and are, therefore, not 
shown. Because of similarities in the patterns of results, 
only power results obtained for α = .05 are presented.
Figures 1 through 8 show power estimates plotted as a 
function of the number of dependent variables for the 
various tests obtained under the four treatment effect 
models described above, and for sample sizes of 10 and 25. 

Figures 1 and 2 show estimates obtained under the first 
treatment model, in which the treatment effect was constant 
for all dependent variables. All of the permutation tests 
were more powerful than Hotelling's T² test across all
numbers of dependent variables investigated. Note,
particularly, that the power of Hotelling's test declines
sharply as the number of variables approaches the number of
subjects. When the number of variables was 8 in Figure 1 or
21 in Figure 2, the power of Hotelling's test was only
slightly above α, demonstrating the near absence of
sensitivity to the treatment effect in this condition. Also
important is the fact that because n = 10 in Figure 1,
Hotelling's T² could not be computed when the number of
variables was greater than 9. The corresponding effect
occurs in Figure 2 (n = 25), when the number of variables is
greater than 24. There is, of course, no such constraint on
the permutation tests. In contrast to the pattern seen for
Hotelling's T², the permutation tests show relatively stable
power across all numbers of variables. This stability, which
is seen in all of the figures, is attributable to the fairly
constant effect sizes used in modeling alternatives.
Finally, as would be expected in this treatment effect, the
one-sided permutation tests were more powerful than their
two-sided counterparts.

Figure 3 shows that, for the disordinal interaction 
type of treatment model, Hotelling's test was more powerful
than all of the permutation tests for the four dependent
variable analysis, but fell below t|sum|, tmax1, and tmax2
when the number of variables was increased. This pattern is
also seen in Figure 4 (n = 25) except that T² remained more
powerful than tmax1 at its upper variable limit. Note also
in these two figures that tsum1 and tsum2 have almost no
sensitivity to this treatment effect, because of the 
canceling effect of opposite signs in the univariate t 
statistics referred to above. 

Figure 5 shows the results for the third treatment 
model, in which only 20% of the dependent variables are 
affected by the treatment. In this figure, T² and tmax1 have
similar power and are most powerful for the four dependent
variable situation. When n is increased to 25 (Figure 6),
Hotelling's test is the most powerful in the analyses with
4, 8, and 16 dependent variables. The decline in the power
of T² at its upper variable limit leaves tmax1 and tmax2 as
the most powerful tests in this condition. 

Figures 7 and 8 show that all of the permutation tests 
are more powerful than Hotelling's T² test for all
situations examined in this treatment effect (the model
analogous to an ordinal interaction in ANOVA). The power
differences are especially substantial for the n = 25
situation (Figure 8).

Discussion 

The multivariate permutation tests described and 
investigated in this research are not advocated as general 
substitutes for Hotelling's T². Hotelling's test is familiar
to researchers, is easily calculated with available
software, and has power advantages under certain conditions.
However, multivariate permutation tests should be considered
as valuable procedures that can be employed in situations
where the T² test is suspect or is not calculable.

Particularly notable is the characteristic decline in
the power of T² as the number of dependent variables
approaches the number of subjects. The implications for
researchers using small samples are obvious. Small sample
situations that are encountered in educational research
include those in which the appropriate unit of analysis is
the classroom or school, projects in which resources for
data collection are limited, and studies of relatively rare
populations (such as autistic children or teachers of
German).

Also of note is the fact that the proposed permutation 
tests are distribution-free under the same condition that 
the Wilcoxon signed-rank test is distribution-free 
(population symmetry about zero). This condition is always
met if, under a true null hypothesis, pretest and posttest
samples are drawn from a common population (Bradley, 1968).
The distribution-free property is especially important in 
small sample situations, where the reliance on the central 
limit theorem is questionable. 

Finally, the permutation tests are constructed to be 
especially sensitive to specific types of treatment effects. 
For research situations in which the nature of the expected 
effects can be specified a priori, this aspect of the 
multivariate permutation tests provides a surprising level 
of statistical power with small samples and large numbers of 
dependent variables. 

Educational researchers face a variety of constraints 
in the conduct of empirical investigations (e.g., obtaining
the cooperation of research subjects and appropriate
authorities, applying experimental treatments consistently
and for a sufficient duration to obtain reliable outcome
estimates). Statistical constraints are recognized as
vitally important in careful research design, because many
conclusions drawn from the research results are based upon
the outcomes of appropriate hypothesis tests.

Researchers should choose research questions and 
variables for investigation on the basis of substantive 
theory and not on the basis of constraints imposed by 
statistical models. The method of permutation tests is 
proposed as a feasible alternative to common multivariate 
statistical testing procedures that relaxes some of the
statistical constraints faced by researchers.

Figure 1

Power of Permutation Tests and Hotelling's T-square Test

Figure 2

Power of Permutation Tests and Hotelling's T-square Test

Figure 3

Power of Permutation Tests and Hotelling's T-square Test
N = 10, Treatment Condition = 2

Figure 4

Power of Permutation Tests and Hotelling's T-square Test
N = 25, Treatment Condition = 2

Figure 5

Power of Permutation Tests and Hotelling's T-square Test

Figure 6

Power of Permutation Tests and Hotelling's T-square Test

Figure 7

Power of Permutation Tests and Hotelling's T-square Test

Figure 8

Power of Permutation Tests and Hotelling's T-square Test